 Sains Malaysiana 52(10)(2023): 2971-2983
                
           http://doi.org/10.17576/jsm-2023-5210-18
            
           
             
           Classifying Severity of Unhealthy Air Pollution Events in Malaysia: A Decision Tree Model
            
           (Mengelaskan Keparahan Kejadian Pencemaran Udara Tidak Sihat di Malaysia: Hasil Model Pokok Keputusan)
                
           
             
           NURULKAMAL MASSERAN1,*, RAZIK RIDZUAN MOHD TAJUDDIN1 & MOHD TALIB LATIF2,3
  
 
             
           1Department of Mathematical Sciences, Faculty of Science and Technology,
           Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia

           2Department of Earth Sciences and Environment, Faculty of Science and Technology,
           Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia

           3Department of Environmental Health, Faculty of Public Health,
           Universitas Airlangga, Surabaya, East Java 60115, Indonesia
  
           
             
           Received: 16 June 2023/Accepted: 2 October 2023
            
           
             
           Abstract
            
           The application of data mining techniques to real-world problems is
           widespread across many knowledge domains. This study proposes a severity
           measure, based on the duration and intensity size of an event, for
           evaluating unhealthy air pollution events. It also proposes a decision tree
           as a predictive model for the binary classification of unhealthy air
           pollution events as extreme or non-extreme, where the two classes are
           defined by a threshold derived from power-law behavior. Duration and
           intensity size were used as the important related features. A case study
           was conducted using air pollution index data for Klang, Malaysia, from
           January 1st, 1997 to August 31st, 2020. The results show that the decision
           tree model provides a high degree of precision and generalization, with
           100% accuracy in classifying air pollution events in the Klang area as
           extreme or non-extreme in severity. In addition, duration size is the most
           influential feature leading to the occurrence of an extreme air pollution
           event. This study therefore suggests that authorities should take
           precautionary measures when a pollution incident persists for a consecutive
           duration exceeding 11 hours.
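The workflow summarized in the abstract can be illustrated with a small, hedged sketch. It is not the authors' implementation. It assumes that an unhealthy event is a run of consecutive hours with an index reading above 100, that intensity is the summed exceedance above that level, and that the extreme/non-extreme labels are already known. The functions `extract_events`, `gini` and `best_split`, the toy series and the labels are all hypothetical. The sketch fits only a single split (a depth-1 decision tree) on duration by minimising Gini impurity.

```python
from itertools import groupby

UNHEALTHY = 100  # assumption: readings above 100 count as unhealthy


def extract_events(api_hourly, threshold=UNHEALTHY):
    """Return (duration, intensity) for each run of consecutive hours above threshold.

    duration  = number of consecutive hours above the threshold
    intensity = summed exceedance above the threshold (illustrative definition)
    """
    events = []
    for above, run in groupby(api_hourly, key=lambda v: v > threshold):
        if above:
            values = list(run)
            events.append((len(values), sum(v - threshold for v in values)))
    return events


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Find the cut on one feature that minimises weighted Gini impurity."""
    best_cut, best_score = None, float("inf")
    distinct = sorted(set(values))
    # Candidate cuts are midpoints between neighbouring distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2.0
        left = [y for x, y in zip(values, labels) if x <= cut]
        right = [y for x, y in zip(values, labels) if x > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score


# Toy hourly series (made-up numbers, not the Klang data).
api = [80, 105, 110, 90, 120, 130, 140, 150, 95, 101, 85,
       160, 170, 180, 190, 200, 99]
events = extract_events(api)
durations = [d for d, _ in events]
# Hypothetical labels, one per event: 1 = extreme, 0 = non-extreme.
labels = [0, 1, 0, 1]
cut, impurity = best_split(durations, labels)
print(events, cut, impurity)
```

On this toy input the split lands between durations of 2 and 4 hours, which separates the two hypothetical classes. A full tree would apply the same search recursively over both duration and intensity.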
  
           
             
           Keywords: Air pollution classification; data mining; extreme air pollution; predictive model
            
           
             
           Abstrak (English translation)

           The application of data mining techniques to real-world problems is
           popular across various knowledge domains. This study proposes the concept
           of a severity measure corresponding to the characteristics of duration and
           intensity size for evaluating unhealthy air pollution events. In line with
           this, the study also proposes the decision tree method as a predictive
           model for the binary classification of extreme and non-extreme unhealthy
           air pollution events, which can be identified from the threshold value of
           power-law behavior. In addition, other characteristics, namely duration and
           intensity size, are also identified as important related features of an air
           pollution event. A case study was conducted using air pollution index data
           for Klang, Malaysia, from 1 January 1997 to 31 August 2020. The results
           show that the decision tree model provides a high degree of precision and
           generalization, with 100% accuracy in classifying extreme and non-extreme
           events with respect to the severity of air pollution in the Klang area.
           Moreover, duration size is identified as the influential feature leading to
           the occurrence of extreme air pollution events. This study therefore also
           suggests that the authorities take precautionary measures when an air
           pollution event persists for a consecutive period exceeding 11 hours.

           Keywords: Predictive model; extreme air pollution; air pollution classification; data mining
            
           
             
  
           
             
           *Corresponding author; email: kamalmsn@ukm.edu.my
            
           
             
               
             
           
            
           
           
           
           
           
                 
              
           
            
            
               